Genomic analysis of Elsinoë arachidis reveals its potential pathogenic mechanism and the biosynthesis pathway of elsinochrome toxin

Elsinochromes (ESCs) are virulence factors produced by Elsinoë arachidis, the causal agent of peanut scab. However, the biosynthesis pathway of ESCs in E. arachidis has not been elucidated, and the potential pathogenic mechanism of E. arachidis is poorly understood. In this study, we report a high-quality genome sequence of E. arachidis. The E. arachidis genome is 33.18 Mb in size, comparable to the average Ascomycota genome (36.91 Mb), and encodes 9174 predicted genes. Self-detoxification gene families, including transporters and cytochrome P450 enzymes, were analyzed, and candidate effectors and cell wall degrading enzymes were investigated as pathogenicity genes using the PHI and CAZy databases. Additionally, the E. arachidis genome contains 24 secondary metabolism gene clusters, in which ESCB1 was identified as the core gene of ESC biosynthesis. Taken together, the genome sequence of E. arachidis provides a new route to explore its potential pathogenic mechanism and the biosynthesis pathway of ESCs.


Introduction
Elsinoë arachidis is a phytopathogenic fungus that causes peanut scab on Arachis hypogaea Linn., resulting in tremendous yield loss (regional losses can be greater than 50%) in peanut planting regions in China [1,2]. The disease occurrence patterns have been determined. However, the mechanism of host-pathogen interaction remains largely unknown, and new and effective measures for the prevention and control of E. arachidis are urgently needed [3][4][5][6].
Interestingly, several Elsinoë species produce elsinochromes (ESCs) [7], which are red, photosensitive perylenequinone toxins. ESCs have previously been shown to promote electrolyte leakage, peroxidation of the plasma membrane, and production of reactive oxygen species such as superoxide (O2−). Additionally, ESCs contribute to pathogenesis and are essential for full virulence, as validated in E. fawcettii by constructing mutants of a polyketide synthase-encoding gene, which is the core gene of ESC biosynthesis [8][9][10]. Cercosporin (Cercospora spp.) is the best-known member of the perylenequinone group of fungal toxins, and its biological functions and biosynthetic pathway have been clarified. Like many toxins identified in ascomycete fungi, its biosynthesis depends on a polyketide synthase (PKS) [11], and the functions of the other genes in the PKS gene cluster have also been determined. However, the biosynthetic pathway of ESCs in E. arachidis and the potential pathogenic mechanism of the fungus remain to be explored. For instance, it is unclear whether, in addition to ESCs, cell wall degrading enzymes or effectors act as virulence factors in E. arachidis [12]. A growing number of studies have applied genome sequencing technology to phytopathogenic fungi, such as Magnaporthe oryzae [13], Fusarium graminearum [14], Sclerotinia sclerotiorum and Botrytis cinerea [15], which has opened new research avenues for understanding their genetic evolution, secondary metabolism, and pathogenic mechanisms.
The present study aimed to explore the possible virulence factors of E. arachidis during host invasion. We report the 33.18 Mb genome sequence of E. arachidis, its secondary metabolism gene clusters, and the discovery of 6 PKS gene clusters, including the ESC biosynthetic gene cluster and its core gene ESCB1. Our analysis of the whole genome suggests that E. arachidis has a complex pathogenesis involving, in addition to the toxin, several candidate virulence factors including effectors, enzymes, and transporters. Moreover, the putative pathogenicity genes provide new starting points for unraveling the pathogenic mechanism of E. arachidis.

Whole-genome sequencing and assembly
In this paper, we used E. arachidis strain LNFT-H01, which was purified by single-spore isolation and cultured on potato dextrose agar (PDA) under 5 microeinstein (μE) m⁻² s⁻¹. The genome of LNFT-H01 was sequenced on the PacBio RS II platform using a 20 kb library of LNFT-H01 genomic DNA at 100× sequencing depth and assembled with Canu [16][17][18]. The assembled whole-genome sequence, totaling 33.18 Mb in 16 scaffolds, was submitted to NCBI (GenBank accession JAAPAX000000000). The characteristics of the genome were mapped in a Circos plot.

Phylogenetic and syntenic analysis
Evolutionary history can be deduced from conserved sequences and conserved biochemical functions. In addition, clustering the orthologous genes of different genomes helps to integrate information on conserved gene families and biological processes. We identified the closest relatives of E. arachidis sequences within reference genomes using OrthoMCL, then constructed a phylogenetic tree using PhyML with Smart Model Selection (SMS) (http://www.atgc-montpellier.fr/phyml-sms/) [19,20]. Syntenic regions between E. arachidis and E. australis were analyzed using MCScanX, which can detect changes in chromosome structure and reveal the history of gene family expansion [21].

Repetitive sequence
Because repetitive sequences (RS) are poorly conserved between species, we built a species-specific RS database from the genome sequence using MITE Hunter, LTR FINDER, Repeat Scout, and PILER [22][23][24][25]. The database was classified with PASTEClassifier and merged with Repbase [26,27]. Finally, we predicted the repetitive sequences with RepeatMasker [28].

Gene prediction and annotation
Ab initio and homology-based methods were used to predict genes in the E. arachidis genome. For ab initio prediction, we used a combination of Augustus, GlimmerHMM, Genscan, GeneID, and SNAP [29][30][31][32]; homology-based prediction was performed with GeMoMa [33], and the results were integrated using EVM [34]. Non-coding RNAs, including rRNA, tRNA, and other RNAs, were also classified and analyzed, with the prediction strategy chosen according to the structural characteristics of each RNA class. Based on the Rfam [35] database, Blastn [36] was used to identify rRNA, and tRNAscan-SE [37] was used to identify tRNA. Pseudogenes have sequences similar to functional genes but have lost their original functions due to mutations. To identify them, we searched for homologous sequences in the genome through BLAT [38] alignment and then used GeneWise [39] to search for premature stop codons and frameshift mutations in the gene sequences. Preliminary functional annotation was conducted with multiple databases, including the Pfam, NR, KOG/COG, KEGG, and GO databases [40][41][42][43]. The pathogen-host interaction (PHI) database, carbohydrate-active enzymes (CAZy) database, and transporter classification database (TCDB) were used to identify potential virulence-related proteins [44][45][46].
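The two signals used to flag pseudogenes, premature stop codons and frameshifts, can be illustrated with a minimal sketch. This is not how GeneWise works internally (GeneWise aligns a protein to genomic DNA); the function name `find_disruptions` and the toy sequences are hypothetical, and a length that is not a multiple of three is used only as a crude stand-in for a frameshift.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_disruptions(cds: str) -> dict:
    """Report simple pseudogene-like signals in a candidate coding sequence.

    Illustrative only: a real pipeline would compare the sequence against
    a homologous protein rather than inspect the sequence in isolation.
    """
    seq = cds.upper()
    # Split into complete codons, ignoring any trailing 1-2 leftover bases.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon anywhere before the final codon is "premature".
    premature = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    return {
        "frameshift_suspected": len(seq) % 3 != 0,
        "premature_stop_codons": premature,
    }


if __name__ == "__main__":
    # Toy sequences, not taken from the E. arachidis genome.
    print(find_disruptions("ATGAAAGGGTAA"))     # intact reading frame
    print(find_disruptions("ATGAAATAGGGGTAA"))  # in-frame stop at codon index 2
    print(find_disruptions("ATGAAAGGTAA"))      # length not divisible by 3
```

The indices in `premature_stop_codons` are zero-based codon positions, so they can be mapped back to nucleotide coordinates by multiplying by three.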

Identification and characterization of polyketide synthases (PKSs) and secondary metabolite clusters
Secondary metabolite clusters were predicted with antiSMASH2 (https://fungismash.secondarymetabolites.org). To confirm the function of the polyketide synthases (PKSs), the core proteins responsible for mycotoxin biosynthesis in different organisms, PKS sequences were used to construct a phylogenetic tree with MEGA 10.0.5. Detailed information on the PKSs is reported in S9 Table. PKS domains were identified via InterPro (https://www.ebi.ac.uk/interpro) and their locations visualized with DOG 2.0.

ESCB1 expression and toxin determination
Elsinochrome extraction and quantitation were performed as previously described [12]. For ESCB1 expression, the strain used for colony culture was the same as that used for toxin extraction. Total RNA was extracted using the TransZol™ Up Plus RNA kit (Beijing, TransGen Biotech). RT-PCR was performed using TransScript® One-Step gDNA Removal and cDNA Synthesis SuperMix (Beijing, TransGen Biotech). qPCR was performed using TransStart® Green qPCR SuperMix with primers ESCB1F (ATCCGAGGTCATTGGTGATG) and ESCB1R (GAGGTTGACATCTGGCATTTG).
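As a quick sanity check on the two primers, their length, GC content, and a rough melting temperature can be computed as sketched below. The sequences are those given in the text, with the stray space in ESCB1R assumed to be a line-break artifact. The Wallace rule (2 °C per A/T, 4 °C per G/C) is a coarse approximation chosen here for illustration; it is not stated in the source and is no substitute for a nearest-neighbor calculation.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


# Primer sequences as given in the Methods text.
PRIMERS = {
    "ESCB1F": "ATCCGAGGTCATTGGTGATG",
    "ESCB1R": "GAGGTTGACATCTGGCATTTG",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, Tm ~{wallace_tm(seq)} C")
```

Under these assumptions the forward primer is 20 nt with 50.0% GC and the reverse primer is 21 nt with about 47.6% GC, giving rough Wallace estimates of 60 and 62 °C.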

The characteristics of the whole-genome
Whole genome sequencing of E. arachidis was performed using PacBio RS II (100× coverage). A total of 6.28 Gb of high-quality raw sequencing data were assembled by Canu into 16 scaffolds (N50, 3,376,838 bp); the characteristics of the assembly are displayed in a Circos plot (Fig 1). Augustus [29] predicted 7,950 genes. To improve accuracy, we additionally ran GlimmerHMM (9,277 genes), Genscan (6,599), GeneID (11,100), and SNAP (10,175) [30][31][32]. Homology-based prediction with GeMoMa [33], taking E. australis as the reference genome, yielded 8,339 genes. Integrating the above results with EVM [34] showed that the E. arachidis genome contains 9,174 genes (Table 1). KOG, KEGG, and GO annotations are provided in S1 Table.
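The N50 reported above is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name and the toy lengths are illustrative and are not the actual E. arachidis scaffold lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_scaffolds = [10, 8, 6, 4, 2]  # hypothetical lengths, total 30
    print(n50(toy_scaffolds))
```

For the toy input, the two longest scaffolds (10 + 8 = 18) are the first to reach half of the total (15), so the N50 is 8.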

Genes associated with detoxification
Transporters. Transporters are membrane-associated proteins that assist the movement of ions, amino acids, and macromolecules across the membrane and thus play an important role in a broad range of cellular activities such as nutrient uptake, the release of secondary metabolites, and signal transduction [48]. The major facilitator superfamily (MFS) and the ATP-binding cassette (ABC) transporter superfamily are the two largest families of fungal transporters [48]. ABC transporters are primary active transporters, usually part of multicomponent transporters, that transport different compounds including polysaccharides, heavy metals, oligopeptides, and inorganic ions. MFS transporters are secondary carriers that facilitate the secretion of endogenous fungal toxins, such as aflatoxins, trichothecenes, and cercosporin. A large number of ABC genes (57) and MFS genes (190) were found in E. arachidis (S2 Table), together representing 57% of the total number of transporters (Fig 2A). […] (Table 2), which encode MFS transporters and are located in the cercosporin biosynthetic gene cluster. They play a role in the secretion of cercosporin in Cercospora nicotianae and are involved in cercosporin resistance [49]. ESC, biosynthesized by E. arachidis, produces reactive oxygen species in the light, which act on cell membranes and destroy the cellular structure. Nevertheless, E. arachidis can grow and develop in the presence of high concentrations of reactive oxygen species, which suggests that it has a detoxification capacity. The ABC and MFS transporters may function in toxin secretion and play an important role in virulence toward the plant.
Cytochrome P450. The cytochrome P450 enzymes (CYPs) are multifunctional oxidoreductases that aid in the detoxification of natural and environmental pollutants and are involved in primary and secondary metabolism [50]. A total of 78 CYPs (S3 Table) were predicted in the E. arachidis genome, of which 20 had matches in the PHI database (Table 2), mainly in the CYP51 and CYP52 families. The CYP51 family, a conserved fungal P450 family, is involved in the biosynthesis of membrane ergosterol. MoCYP51B and MoCYP51A both encode a sterol 14α-demethylase in M. oryzae that is required for conidiogenesis and mediates the action of sterol demethylation inhibitor (DMI) fungicides [51]. CYP52X1, a member of the CYP52 family, is involved in the degradation of specific epidermal lipid components in the insect waxy layer [52]. In general, the CYPs may be involved in the detoxification of the pathogen's own toxins.
Fig 1 legend (partial): the fourth circle shows repeated sequences; the fifth circle shows tRNA and rRNA (blue: tRNA, purple: rRNA); the sixth circle shows GC content (light yellow: higher than the average GC content; blue: lower than the average GC content); the innermost circle shows GC skew (dark gray: G content greater than C; red: C content greater than G).

Analyses of pathogenicity proteins encoded by the E. arachidis genome
Using the pathogen-host interaction (PHI) database, 2,752 potential pathogenicity genes were identified in E. arachidis (Fig 2B), mainly in the categories of increased virulence and effectors, loss of pathogenicity, and reduced virulence (S4 Table).
Effectors. During the interaction between pathogens and hosts, pathogens produce effector proteins that alter the cell structure and metabolic pathways of the host plant, thereby promoting successful infection or triggering host defense reactions. In total, 734 genes were predicted to code for secreted proteins in the E. arachidis genome. Analysis of the PHI database revealed 25 candidate effectors (Table 3), including EVM0006757.1, a gene homologous to PemG1, an elicitor-encoding gene of Magnaporthe oryzae that triggers the expression of the phenylalanine ammonia-lyase gene [53], and EVM0003806, a gene homologous to the glucanase inhibitor protein GPI1 [54] secreted by Phytophthora sojae, which inhibits the EGaseA-mediated release of elicitor-active glucan oligosaccharides from the P. sojae cell wall. The functions of the candidate effectors from E. arachidis need further testing and verification, but they provide a novel research direction for elucidating its pathogenic mechanisms.
Carbohydrate-active enzymes. The cuticle and cell wall of plants are the primary barriers that prevent the invasion of pathogens. The ability to degrade complex plant cell wall carbohydrates such as cellulose and pectin is therefore an indispensable part of the fungal life cycle. The CAZymes secreted by pathogenic fungi degrade complex plant cell wall carbohydrates to simple monomers that can be used as carbon sources and thereby assist pathogen invasion [55]. Mapping the E. arachidis genome against the CAZy database detected 602 genes potentially encoding CAZymes (S6 Table). We then compared the CAZyme content with that of other ascomycetes, including necrotrophic plant pathogens (S. sclerotiorum and B. cinerea), a biotrophic pathogen (B. graminis), and hemi-biotrophic pathogens (M. oryzae and F. graminearum) (Fig 2C, S7 Table). The CAZyme content of E. arachidis is the largest among the compared fungal genomes, which suggests that CAZyme content does not directly correlate with the lifestyle of the fungus. Further analysis showed that the number of pectin- and cellulose-degrading enzymes in E. arachidis (39) was smaller than in the necrotrophic plant pathogens S. sclerotiorum (53) and B. cinerea (62) but considerably larger than in B. graminis (2) (Fig 2D). In addition to cell wall degrading enzymes, different pathogens likely use different strategies to penetrate plant tissues.

Secondary metabolism
Gene clusters of PKS in E. arachidis. E. arachidis encodes 24 secondary metabolism clusters, comprising PKS (6), nonribosomal peptide synthetase (NRPS) (11), NRPS-PKS (1), and terpene (6) clusters (S3 Fig). The number of PKS clusters in E. arachidis was lower than in M. oryzae and similar to that in E. fawcettii and F. graminearum, but the number of NRPS clusters was twice that of E. fawcettii, indicating significant differences in metabolic pathways between E. fawcettii and E. arachidis (S4 Fig). We analyzed the PKS proteins from E. arachidis for conserved domains with InterProScan and visualized them using DOG 2.0 (Fig 3). The E. arachidis PKSs contain 8 different domains: KS, AT, TE, ER, KR, MeT, ACP, and DH. According to their domain structures, the 6 PKS genes could be further divided into reducing (EVM0002563, EVM0005988, EVM0006869) and non-reducing (EVM0003759, EVM0004732, EVM0005880) types, based on the presence of the reducing domains ER and KR.
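The reducing versus non-reducing split described above amounts to checking each PKS for ER or KR domains. A minimal sketch of that rule is given below. The function name `classify_pks` and the two example architectures are hypothetical illustrations and do not reproduce the actual domain compositions of the six E. arachidis genes shown in Fig 3.

```python
# Domains named in the text as conferring reducing activity.
REDUCING_DOMAINS = {"ER", "KR"}


def classify_pks(domains):
    """Classify a PKS as 'reducing' or 'non-reducing' from its domain list."""
    present = {domain.upper() for domain in domains}
    return "reducing" if present & REDUCING_DOMAINS else "non-reducing"


if __name__ == "__main__":
    # Hypothetical architectures for illustration only.
    examples = {
        "example_reducing": ["KS", "AT", "DH", "MeT", "ER", "KR", "ACP"],
        "example_non_reducing": ["KS", "AT", "ACP", "ACP", "TE"],
    }
    for name, domains in examples.items():
        print(name, "->", classify_pks(domains))
```

Treating the presence of either domain as sufficient is a simplifying assumption; a stricter rule could require specific combinations of reducing domains.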
To further differentiate the 6 PKS genes, 19 different PKS genes were analyzed (S8 Table). Among the 6 PKSs from E. arachidis, EVM0003759 fell in the same clade as EaPKS, which encodes the PKS for ESC biosynthesis in E. australis; we therefore named it ESCB1 (Elsinochrome Biosynthesis gene 1). Interestingly, EVM0004732 and EVM0005880 are related to the biosynthesis of melanin (Fig 4). This is the first time that melanin biosynthesis has been predicted in this pathogen. Whether melanin in E. arachidis plays a role in pathogenicity by aiding penetration of the host plant, as it does in M. oryzae, remains to be verified.

PLOS ONE
[…] EVM0007299, which encodes an O-methyltransferase; EVM0006582 and EVM0006794, which are similar to MFS transporters; EVM0002495, a cytochrome P450; and EVM0002638, a zinc finger transcription factor.

Discussion
Elsinoë species cause scab and spot anthracnose on various crops including peanut, cassava, citrus, mango, and grape. In this paper, we reported the first whole genome sequence of E. arachidis and revealed complex gene structures that may be involved in its pathogenic mechanism. Additionally, we predicted the ESC toxin biosynthesis gene cluster. The genome size of E. arachidis is 33.18 Mb, which is comparable to the average Ascomycota genome size; however, it is larger than that of E. australis (23.34 Mb). This may be due to the lower proportion of repeat sequences in the E. fawcettii genome [56]. The GC content was 48.24%, and CDSs accounted for 43.94% of the genome.
Mycotoxins play an important part in the pathogenic mechanisms of pathogens. ESCs, photosensitive perylenequinone mycotoxins, produce reactive oxygen species (ROS) that act on the cell membrane and destroy the cell structure. E. arachidis can maintain growth and development even in the presence of high toxin levels, which indicates an efficient self-detoxification mechanism. We identified ABC transporters and MFS transporters in E. arachidis, indicating complex transport of substances in the fungus; some of these transporters may affect the secretion of ESCs. The cytochrome P450 enzyme system, a family of multifunctional oxidoreductases, may be involved in the self-detoxification of E. arachidis by providing the redox conditions needed to maintain a steady state for various physiological and biochemical reactions.
ESC is a crucial virulence factor in the pathogenic process of E. arachidis. However, compared with mycotoxins such as aflatoxins, fumonisin, and trichothecenes, and host-selective toxins such as T-toxin, little is known about the biosynthetic pathways of perylenequinone mycotoxins. For cercosporin, which belongs to the same group of perylenequinone toxins as ESC, CTB1 (cercosporin synthase gene 1), which encodes a polyketide synthase, has been shown to be the core gene of the biosynthesis pathway [10]. Efpks1 has been shown to function in ESC biosynthesis in E. fawcettii, but the specific biosynthesis pathway still needs to be clarified [8,9]. Prediction of the secondary metabolism gene clusters of E. arachidis yielded 6 gene clusters related to polyketide synthase, with the core genes EVM0002563, EVM0003759, EVM0004732, EVM0005880, EVM0005988, and EVM0006869. Phylogenetic analysis suggested that EVM0003759 is involved in ESC synthesis, while EVM0004732 and EVM0005880 play a role in melanin synthesis. To our knowledge, this is the first time that melanin biosynthesis genes have been identified in E. arachidis. Interestingly, analysis of the positions of the core genes of the ESC and melanin gene clusters showed that all three genes are located on Contig00003. This result raises the question of whether the PKS synthesis pathways of ESC and melanin are interrelated or competing.
Pathogens employ complex mechanisms to break through plant defenses, using toxins, enzymes, and other pathogenic factors to assist invasion and colonization. Analysis of the CAZy and PHI databases revealed that, in addition to ESCs, enzymes, effectors, and certain transcription factors may be involved in the pathogenic process. Factors in the increased-virulence category (3%) include O-methylsterigmatocystin oxidoreductase, AK-toxin biosynthetic gene 7 (AKT7), and the bZIP transcription factor MeaB. EVM0005728, EVM0001699, and EVM0004784 are related to AKT7, which encodes a cytochrome P450 monooxygenase in Alternaria alternata and can limit production of the host-selective AK-toxin [57]. EVM0002472 has a basic leucine zipper (bZIP) domain similar to that of the MeaB transcription factor in Fusarium oxysporum [58], which activates a conserved nitrogen-responsive pathway to control the virulence of plant pathogenic fungi (S5 Table).
In conclusion, we reported the whole-genome sequence of E. arachidis. Analysis of its assembly and annotation allowed the identification of the presumptive PKS gene clusters. Based on our results, we hypothesize that ESCB1 may be the core gene of ESC biosynthesis. Additionally, pathogenic factors including CAZymes and effectors may help E. arachidis to circumvent the defense mechanisms of peanut. Our work lays the foundation for future research aimed at elucidating the detailed pathogenic mechanisms of E. arachidis.

Conclusions
In conclusion, this is the first report of a high-quality genome of E. arachidis sequenced with PacBio RS II. The basic sequence information, gene families, and metabolic gene clusters of E. arachidis were characterized. Through further analysis of the key genes in the different PKS gene clusters, ESCB1 (EVM0003759) was preliminarily determined, from its expression under light and dark conditions, to participate in the ESC biosynthetic pathway. The flanking sequences of this gene cluster were annotated and include a major facilitator superfamily transporter, cytochrome P450, monooxygenase, and O-methyltransferase. In addition to ESC toxins, genes related to the biosynthesis of other secondary metabolites such as melanin were also noted. This information provides new ideas for further exploration of the pathogenic mechanism of E. arachidis.